refactor: state schema with per-resource content hashes#19
Open
dhruva-reddy wants to merge 1 commit intodhruva-reddy/feat/sim-runnerfrom
Open
refactor: state schema with per-resource content hashes#19dhruva-reddy wants to merge 1 commit intodhruva-reddy/feat/sim-runnerfrom
dhruva-reddy wants to merge 1 commit intodhruva-reddy/feat/sim-runnerfrom
Conversation
This was referenced May 1, 2026
Contributor
Author
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
This was referenced May 1, 2026
8468fc9 to
4e55f1f
Compare
c9cf252 to
10c101c
Compare
dhruva-reddy
added a commit
that referenced
this pull request
May 2, 2026
**Problem.** The Vapi API rejects bad configs at PATCH time with terse
400s ("property speed should not exist") — and by then the push has
already partially completed against other resources. We watched the
same five classes of mistake hit production over and over:
1. Assistant names (or eval names) longer than 40 chars (silent cap).
2. Structured-output ↔ assistant lockstep mismatch — one side declares
the relationship, the other doesn't, dashboard ends up inconsistent.
3. Prompts duplicated by paste-on-top dashboard edits (10kB prompt
with two identical headers stacked, agent follows both).
4. `maxTokens` set lower than the JSON-schema size of the attached
tools' arguments — assistant looks fine on push, bricks on first
tool-using call.
5. Voice fields nested wrong for the provider (`voice.speed` on
Cartesia, where it lives at `voice.generationConfig.speed`).
**What this fix does.** Five client-side validators, all running off
the same `LoadedResources` shape that `push.ts` would actually ship —
so the lint runs against exactly what would be pushed, no separate
parser to drift. Surfaces as warnings by default (one bad spec doesn't
block an otherwise-good push); promote to abort with `--strict`. Run
standalone via `npm run validate -- <org>`.
**Outcome you'll notice.** Most schema-class mistakes get caught
locally in seconds instead of mid-push 400s. Voice provider field
mismatch gets a specific message pointing at the right path. CI can
add `npm run push -- <env> --strict` as a gate before any deploy.
---
Catch the classes of errors that today only surface when the API returns
a 400 mid-push. The push pipeline runs validation in warn-only mode by
default; --strict promotes errors to a blocking abort before any API
call. Standalone runner via `npm run validate -- <org>`.
Validators implemented:
1. Name length cap (40 chars). Walks every assistant.name and every
evaluations[].structuredOutput.name in scenarios. Closes #18.
2. SO ↔ assistant bidirectional lockstep. For every SO file's
assistant_ids, checks the named assistant's structuredOutputIds
mirrors it; reverse direction too. Closes #11.
3. Prompt duplication heuristics. Same H1 heading appearing twice,
repeated CONTINUITY ON ENTRY / CLOSEOUT FLOW STRUCTURE blocks.
Partial fix for #8 (paste-on-top dashboard duplications).
4. maxTokens floor for tool-using assistants. Computes
floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
per attached tool. Warns under floor. Closes #19.
5. Per-provider voice schema. Cartesia rejects top-level speed /
stability / similarityBoost / enableSsmlParsing (point at
generationConfig.* / drop the field). 11labs rejects
generationConfig (it's a Cartesia path). Closes #9 (engine half).
- src/validate.ts (NEW): validateResources(loadedResources) returning
ValidationFinding[] with severity / type / resourceId / rule / message
/ fieldPath. Pure data; safe to test directly.
- src/validate-cmd.ts (NEW): CLI entry. Loads same resource shape as
push.ts so the lint runs against exactly what would ship. Exit non-zero
on any error finding.
- src/config.ts: --strict flag.
- src/push.ts: validators run in default-warn mode; --strict aborts.
- package.json: validate script.
- AGENTS.md: document npm run validate and --strict.
- tests/validate.test.ts: per-rule fixtures (golden + bad inputs)
covering all five checks.
Closes improvements.md #11, #18, #19. Resolves engine half of #9.
Partial #8, #20 (heuristic only).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
4e55f1f to
346fbf7
Compare
10c101c to
1efe5ec
Compare
dhruva-reddy
added a commit
that referenced
this pull request
May 2, 2026
**Problem.** The Vapi API rejects bad configs at PATCH time with terse
400s ("property speed should not exist") — and by then the push has
already partially completed against other resources. We watched the
same five classes of mistake hit production over and over:
1. Assistant names (or eval names) longer than 40 chars (silent cap).
2. Structured-output ↔ assistant lockstep mismatch — one side declares
the relationship, the other doesn't, dashboard ends up inconsistent.
3. Prompts duplicated by paste-on-top dashboard edits (10kB prompt
with two identical headers stacked, agent follows both).
4. `maxTokens` set lower than the JSON-schema size of the attached
tools' arguments — assistant looks fine on push, bricks on first
tool-using call.
5. Voice fields nested wrong for the provider (`voice.speed` on
Cartesia, where it lives at `voice.generationConfig.speed`).
**What this fix does.** Five client-side validators, all running off
the same `LoadedResources` shape that `push.ts` would actually ship —
so the lint runs against exactly what would be pushed, no separate
parser to drift. Surfaces as warnings by default (one bad spec doesn't
block an otherwise-good push); promote to abort with `--strict`. Run
standalone via `npm run validate -- <org>`.
**Outcome you'll notice.** Most schema-class mistakes get caught
locally in seconds instead of mid-push 400s. Voice provider field
mismatch gets a specific message pointing at the right path. CI can
add `npm run push -- <env> --strict` as a gate before any deploy.
---
Catch the classes of errors that today only surface when the API returns
a 400 mid-push. The push pipeline runs validation in warn-only mode by
default; --strict promotes errors to a blocking abort before any API
call. Standalone runner via `npm run validate -- <org>`.
Validators implemented:
1. Name length cap (40 chars). Walks every assistant.name and every
evaluations[].structuredOutput.name in scenarios. Closes #18.
2. SO ↔ assistant bidirectional lockstep. For every SO file's
assistant_ids, checks the named assistant's structuredOutputIds
mirrors it; reverse direction too. Closes #11.
3. Prompt duplication heuristics. Same H1 heading appearing twice,
repeated CONTINUITY ON ENTRY / CLOSEOUT FLOW STRUCTURE blocks.
Partial fix for #8 (paste-on-top dashboard duplications).
4. maxTokens floor for tool-using assistants. Computes
floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
per attached tool. Warns under floor. Closes #19.
5. Per-provider voice schema. Cartesia rejects top-level speed /
stability / similarityBoost / enableSsmlParsing (point at
generationConfig.* / drop the field). 11labs rejects
generationConfig (it's a Cartesia path). Closes #9 (engine half).
- src/validate.ts (NEW): validateResources(loadedResources) returning
ValidationFinding[] with severity / type / resourceId / rule / message
/ fieldPath. Pure data; safe to test directly.
- src/validate-cmd.ts (NEW): CLI entry. Loads same resource shape as
push.ts so the lint runs against exactly what would ship. Exit non-zero
on any error finding.
- src/config.ts: --strict flag.
- src/push.ts: validators run in default-warn mode; --strict aborts.
- package.json: validate script.
- AGENTS.md: document npm run validate and --strict.
- tests/validate.test.ts: per-rule fixtures (golden + bad inputs)
covering all five checks.
Closes improvements.md #11, #18, #19. Resolves engine half of #9.
Partial #8, #20 (heuristic only).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
346fbf7 to
7e5eb7f
Compare
1efe5ec to
92c0312
Compare
dhruva-reddy
added a commit
that referenced
this pull request
May 2, 2026
**Problem.** The Vapi API rejects bad configs at PATCH time with terse
400s ("property speed should not exist") — and by then the push has
already partially completed against other resources. We watched the
same five classes of mistake hit production over and over:
1. Assistant names (or eval names) longer than 40 chars (silent cap).
2. Structured-output ↔ assistant lockstep mismatch — one side declares
the relationship, the other doesn't, dashboard ends up inconsistent.
3. Prompts duplicated by paste-on-top dashboard edits (10kB prompt
with two identical headers stacked, agent follows both).
4. `maxTokens` set lower than the JSON-schema size of the attached
tools' arguments — assistant looks fine on push, bricks on first
tool-using call.
5. Voice fields nested wrong for the provider (`voice.speed` on
Cartesia, where it lives at `voice.generationConfig.speed`).
**What this fix does.** Five client-side validators, all running off
the same `LoadedResources` shape that `push.ts` would actually ship —
so the lint runs against exactly what would be pushed, no separate
parser to drift. Surfaces as warnings by default (one bad spec doesn't
block an otherwise-good push); promote to abort with `--strict`. Run
standalone via `npm run validate -- <org>`.
**Outcome you'll notice.** Most schema-class mistakes get caught
locally in seconds instead of mid-push 400s. Voice provider field
mismatch gets a specific message pointing at the right path. CI can
add `npm run push -- <env> --strict` as a gate before any deploy.
---
Catch the classes of errors that today only surface when the API returns
a 400 mid-push. The push pipeline runs validation in warn-only mode by
default; --strict promotes errors to a blocking abort before any API
call. Standalone runner via `npm run validate -- <org>`.
Validators implemented:
1. Name length cap (40 chars). Walks every assistant.name and every
evaluations[].structuredOutput.name in scenarios. Closes #18.
2. SO ↔ assistant bidirectional lockstep. For every SO file's
assistant_ids, checks the named assistant's structuredOutputIds
mirrors it; reverse direction too. Closes #11.
3. Prompt duplication heuristics. Same H1 heading appearing twice,
repeated CONTINUITY ON ENTRY / CLOSEOUT FLOW STRUCTURE blocks.
Partial fix for #8 (paste-on-top dashboard duplications).
4. maxTokens floor for tool-using assistants. Computes
floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
per attached tool. Warns under floor. Closes #19.
5. Per-provider voice schema. Cartesia rejects top-level speed /
stability / similarityBoost / enableSsmlParsing (point at
generationConfig.* / drop the field). 11labs rejects
generationConfig (it's a Cartesia path). Closes #9 (engine half).
- src/validate.ts (NEW): validateResources(loadedResources) returning
ValidationFinding[] with severity / type / resourceId / rule / message
/ fieldPath. Pure data; safe to test directly.
- src/validate-cmd.ts (NEW): CLI entry. Loads same resource shape as
push.ts so the lint runs against exactly what would ship. Exit non-zero
on any error finding.
- src/config.ts: --strict flag.
- src/push.ts: validators run in default-warn mode; --strict aborts.
- package.json: validate script.
- AGENTS.md: document npm run validate and --strict.
- tests/validate.test.ts: per-rule fixtures (golden + bad inputs)
covering all five checks.
Closes improvements.md #11, #18, #19. Resolves engine half of #9.
Partial #8, #20 (heuristic only).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
dhruva-reddy
added a commit
that referenced
this pull request
May 2, 2026
**Problem.** The Vapi API rejects bad configs at PATCH time with terse
400s ("property speed should not exist") — and by then the push has
already partially completed against other resources. We watched the
same five classes of mistake hit production over and over:
1. Assistant names (or eval names) longer than 40 chars (silent cap).
2. Structured-output ↔ assistant lockstep mismatch — one side declares
the relationship, the other doesn't, dashboard ends up inconsistent.
3. Prompts duplicated by paste-on-top dashboard edits (10kB prompt
with two identical headers stacked, agent follows both).
4. `maxTokens` set lower than the JSON-schema size of the attached
tools' arguments — assistant looks fine on push, bricks on first
tool-using call.
5. Voice fields nested wrong for the provider (`voice.speed` on
Cartesia, where it lives at `voice.generationConfig.speed`).
**What this fix does.** Five client-side validators, all running off
the same `LoadedResources` shape that `push.ts` would actually ship —
so the lint runs against exactly what would be pushed, no separate
parser to drift. Surfaces as warnings by default (one bad spec doesn't
block an otherwise-good push); promote to abort with `--strict`. Run
standalone via `npm run validate -- <org>`.
**Outcome you'll notice.** Most schema-class mistakes get caught
locally in seconds instead of mid-push 400s. Voice provider field
mismatch gets a specific message pointing at the right path. CI can
add `npm run push -- <env> --strict` as a gate before any deploy.
---
Catch the classes of errors that today only surface when the API returns
a 400 mid-push. The push pipeline runs validation in warn-only mode by
default; --strict promotes errors to a blocking abort before any API
call. Standalone runner via `npm run validate -- <org>`.
Validators implemented:
1. Name length cap (40 chars). Walks every assistant.name and every
evaluations[].structuredOutput.name in scenarios. Closes #18.
2. SO ↔ assistant bidirectional lockstep. For every SO file's
assistant_ids, checks the named assistant's structuredOutputIds
mirrors it; reverse direction too. Closes #11.
3. Prompt duplication heuristics. Same H1 heading appearing twice,
repeated CONTINUITY ON ENTRY / CLOSEOUT FLOW STRUCTURE blocks.
Partial fix for #8 (paste-on-top dashboard duplications).
4. maxTokens floor for tool-using assistants. Computes
floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
per attached tool. Warns under floor. Closes #19.
5. Per-provider voice schema. Cartesia rejects top-level speed /
stability / similarityBoost / enableSsmlParsing (point at
generationConfig.* / drop the field). 11labs rejects
generationConfig (it's a Cartesia path). Closes #9 (engine half).
- src/validate.ts (NEW): validateResources(loadedResources) returning
ValidationFinding[] with severity / type / resourceId / rule / message
/ fieldPath. Pure data; safe to test directly.
- src/validate-cmd.ts (NEW): CLI entry. Loads same resource shape as
push.ts so the lint runs against exactly what would ship. Exit non-zero
on any error finding.
- src/config.ts: --strict flag.
- src/push.ts: validators run in default-warn mode; --strict aborts.
- package.json: validate script.
- AGENTS.md: document npm run validate and --strict.
- tests/validate.test.ts: per-rule fixtures (golden + bad inputs)
covering all five checks.
Closes improvements.md #11, #18, #19. Resolves engine half of #9.
Partial #8, #20 (heuristic only).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
7e5eb7f to
ffa91d9
Compare
## ELI5
**Problem.** The state file (`.vapi-state.<env>.json`) used to map
*name → UUID* and nothing else. So when push went to update a resource,
the engine had no way to tell whether someone had edited the resource
on the dashboard since you last pulled — there was nothing to compare
to. This is the root cause of "drift detection isn't possible," "real
rollback isn't possible," and "scoped pushes can't be precise about
what they touched": the engine has no per-resource memory of *what
was there before*.
**What this fix does.** Widens each state entry from a bare `string`
(the UUID) to a `ResourceState` object carrying:
- `uuid` — the platform UUID (unchanged semantics)
- `lastPulledHash` — sha256 of the platform payload at last pull
- `lastPulledAt` — ISO timestamp
- `lastPushedHash` — sha256 of the last pushed payload
- `platformVersionId` — Stack I, populated when platform exposes one
Every state-reading and state-writing call site is updated. **No new
external behavior ships in this PR alone** — strictly plumbing.
Backwards compatible: legacy state files (the old `string` shape) load
fine, just without hashes until the next pull/push populates them. The
on-disk file isn't rewritten until the next `saveState`, so a "deploy
and immediately rollback" doesn't corrupt state.
**Outcome you'll notice.** This PR alone changes nothing visible. It's
the architectural foundation that **drift detection (Stack G), snapshot
rollback (Stack H), and scoped state writes (Stack J)** all depend on.
After it lands, your next pull populates `lastPulledHash` for every
resource, and the next three PRs unlock real safety guarantees.
---
Architectural pivot. State sections move from Record<string, string>
(name → UUID) to Record<string, ResourceState> carrying:
- uuid: string (the platform UUID, unchanged semantics)
- lastPulledHash?: string (sha256 of canonicalized platform payload)
- lastPulledAt?: string (ISO timestamp)
- lastPushedHash?: string (sha256 of last pushed payload)
- platformVersionId?: string (Stack I — populated when platform exposes one)
This is the architectural prerequisite for drift detection (Stack G),
snapshot rollback (Stack H), optimistic concurrency (Stack I), and scoped
state writes (Stack J). Every state-reading call site is updated, but
NO new external behavior ships in this PR — strictly plumbing.
Backwards compatibility:
- src/state.ts:loadState wraps any legacy bare-string value as
{ uuid: <string> } at load time. Existing customer state files keep
working until their next pull populates hashes. No flag-day migration.
- The on-disk file is NOT rewritten until the next saveState, so a
"deploy and immediately rollback" scenario does NOT corrupt state.
Files:
- src/types.ts: ResourceState type, StateFile sections retyped.
- src/state-serialize.ts: hashPayload (canonicalize + sha256),
asResourceState (legacy migration), upsertState (preserves un-touched
fields when patching).
- src/state.ts: stateUuid helper for the common case;
loadState wraps legacy string entries via migrateSection;
re-exports the helpers for ergonomics.
- src/pull.ts: each pull populates lastPulledHash + lastPulledAt;
credential entries preserve prior metadata when slug+uuid are stable.
- src/push.ts: each PATCH/POST populates lastPushedHash via upsertState.
All `state.X[id]` reads → `?.uuid`. State assignments → upsertState.
- src/cleanup.ts, src/credentials.ts, src/delete.ts, src/eval.ts,
src/resolver.ts, src/call.ts: mechanical updates for the new shape.
Verified by tsc — no leaks where a bare string is still expected.
- tests/state-migration.test.ts: legacy string entries load and
round-trip; mixed legacy + new entries; canonicalize stability;
hashPayload determinism; upsertState preservation semantics.
Closes improvements.md #4 (architectural prerequisite). G/H/I/J unblocked.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
92c0312 to
d4ac9d6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

ELI5
Problem. The state file (
.vapi-state.<env>.json) used to mapname → UUID and nothing else. So when push went to update a resource,
the engine had no way to tell whether someone had edited the resource
on the dashboard since you last pulled — there was nothing to compare
to. This is the root cause of "drift detection isn't possible," "real
rollback isn't possible," and "scoped pushes can't be precise about
what they touched": the engine has no per-resource memory of what
was there before.
What this fix does. Widens each state entry from a bare
string(the UUID) to a
ResourceStateobject carrying:uuid— the platform UUID (unchanged semantics)lastPulledHash— sha256 of the platform payload at last pulllastPulledAt— ISO timestamplastPushedHash— sha256 of the last pushed payloadplatformVersionId— Stack I, populated when platform exposes oneEvery state-reading and state-writing call site is updated. No new
external behavior ships in this PR alone — strictly plumbing.
Backwards compatible: legacy state files (the old
stringshape) loadfine, just without hashes until the next pull/push populates them. The
on-disk file isn't rewritten until the next
saveState, so a "deployand immediately rollback" doesn't corrupt state.
Outcome you'll notice. This PR alone changes nothing visible. It's
the architectural foundation that drift detection (Stack G), snapshot
rollback (Stack H), and scoped state writes (Stack J) all depend on.
After it lands, your next pull populates
lastPulledHashfor everyresource, and the next three PRs unlock real safety guarantees.
Architectural pivot. State sections move from Record<string, string>
(name → UUID) to Record<string, ResourceState> carrying:
This is the architectural prerequisite for drift detection (Stack G),
snapshot rollback (Stack H), optimistic concurrency (Stack I), and scoped
state writes (Stack J). Every state-reading call site is updated, but
NO new external behavior ships in this PR — strictly plumbing.
Backwards compatibility:
{ uuid: } at load time. Existing customer state files keep
working until their next pull populates hashes. No flag-day migration.
"deploy and immediately rollback" scenario does NOT corrupt state.
Files:
asResourceState (legacy migration), upsertState (preserves un-touched
fields when patching).
loadState wraps legacy string entries via migrateSection;
re-exports the helpers for ergonomics.
credential entries preserve prior metadata when slug+uuid are stable.
All
state.X[id]reads →?.uuid. State assignments → upsertState.src/resolver.ts, src/call.ts: mechanical updates for the new shape.
Verified by tsc — no leaks where a bare string is still expected.
round-trip; mixed legacy + new entries; canonicalize stability;
hashPayload determinism; upsertState preservation semantics.
Closes improvements.md #4 (architectural prerequisite). G/H/I/J unblocked.
🤖 Generated with Claude Code